Educating Lia: The Development of a Linguistically Accurate Memory-Based Lemmatiser for Afrikaans

Author

  • Hendrik J. Groenewald
Abstract

This paper describes the development of a memory-based lemmatiser for Afrikaans called Lia. The paper begins with a brief overview of Afrikaans lemmatisation, which, within the context of this paper, is treated as a simplified form of morphological analysis. This overview is followed by an introduction to memory-based learning, the machine learning technique used to develop the Afrikaans lemmatiser. The deployment of Lia is then discussed, with specific emphasis on the format of the training and testing data. Next, the lemmatiser is evaluated, and Lia is shown to achieve a linguistic accuracy of over 90%. The paper concludes with ideas for future work that could further improve the linguistic accuracy of the Afrikaans lemmatiser.
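One common way to apply memory-based learning to lemmatisation is to treat it as classification: each word form is mapped to a class describing how to rewrite it into its lemma, and an unseen word receives the class of its most similar stored training instances. The following is a minimal sketch of that idea under stated assumptions; it is not Lia's implementation, and the paper's actual training and testing data format is not reproduced here. The feature design (the last five letters, right-aligned), the class encoding (number of trailing letters to delete plus a string to append), the unweighted overlap distance with k = 1, and the small Afrikaans training list are all hypothetical choices made for illustration.

```python
from collections import Counter


def suffix_features(word, n=5):
    """Return the last n letters of word as a tuple, left-padded with '_'.

    Assumption: right-aligned letters are used as features, so the final
    position always holds the last letter, where Afrikaans suffixes appear.
    """
    padded = "_" * n + word
    return tuple(padded[-n:])


def transformation(word, lemma):
    """Encode word -> lemma as (letters to delete from the end, string to append)."""
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1  # length of the shared prefix
    return (len(word) - i, lemma[i:])


def apply_transformation(word, label):
    """Apply a (delete, append) class label to a word form."""
    delete, append = label
    return word[:len(word) - delete] + append


class MemoryBasedLemmatiser:
    """Hypothetical k-nearest-neighbour classifier that stores every training instance."""

    def __init__(self, n=5, k=1):
        self.n = n
        self.k = k
        self.memory = []  # list of (feature tuple, class label)

    def train(self, pairs):
        # Memory-based "training" only stores instances; no rules are abstracted.
        for word, lemma in pairs:
            self.memory.append((suffix_features(word, self.n), transformation(word, lemma)))

    def lemmatise(self, word):
        feats = suffix_features(word, self.n)
        # Overlap distance: count feature positions whose letters differ.
        # sorted() is stable, so distance ties are broken by training order.
        ranked = sorted(self.memory, key=lambda inst: sum(a != b for a, b in zip(feats, inst[0])))
        # Majority vote over the k nearest instances.
        votes = Counter(label for _, label in ranked[: self.k])
        return apply_transformation(word, votes.most_common(1)[0][0])


# Hypothetical toy training data (word form, lemma); not taken from Lia's data.
TRAINING_PAIRS = [
    ("boeke", "boek"), ("honde", "hond"), ("kinders", "kind"), ("tafels", "tafel"),
    ("huisie", "huis"), ("katte", "kat"), ("skole", "skool"), ("boek", "boek"),
]

if __name__ == "__main__":
    lemmatiser = MemoryBasedLemmatiser(n=5, k=1)
    lemmatiser.train(TRAINING_PAIRS)
    for form in ["stoele", "huise", "boekie"]:
        print(form, "->", lemmatiser.lemmatise(form))
```

With this toy memory, the unseen forms stoele, huise and boekie are lemmatised to stoel, huis and boek, each by copying the transformation of its nearest stored neighbour. Weighting every letter position equally is a simplification; memory-based learners often weight features (for example by information gain) so that the most informative positions dominate the distance.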

Similar resources

Using Technology Transfer to Advance Automatic Lemmatisation for Setswana

South African languages (and indigenous African languages in general) lag behind other languages in terms of the availability of linguistic resources. Efforts to improve or fast-track the development of linguistic resources are required to bridge this ever-increasing gap. In this paper we emphasize the advantages of technology transfer between two languages to advance an existing linguistic tech...

The AnIta-Lemmatiser: A Tool for Accurate Lemmatisation of Italian Texts

This paper presents the AnIta-Lemmatiser, an automatic tool to lemmatise Italian texts. It is based on a powerful morphological analyser enriched with a large lexicon and some heuristic techniques to select the most appropriate lemma among those that can be morphologically associated to an ambiguous wordform. The heuristics are essentially based on the frequency-of-use tags provided by the De M...

FPGA Implementation of a Hammerstein Based Digital Predistorter for Linearizing RF Power Amplifiers with Memory Effects

Power amplifiers (PAs) are inherently nonlinear elements and digital predistortion is a highly cost-effective approach to linearize them. Although most existing architectures assume that the PA has a memoryless nonlinearity, memory effects of the PAs in many applications, such as wideband code-division multiple access (WCDMA) or orthogonal frequency-division multiplexing (OFDM), can no longer b...

Chaotic Genetic Algorithm based on Explicit Memory with a new Strategy for Updating and Retrieval of Memory in Dynamic Environments

Many of the problems considered in optimization and learning assume that solutions exist in a dynamic environment. Hence, algorithms are required that adapt dynamically to the problem's conditions and explore new conditions. In most cases, using information from the past allows rapid adaptation to subsequent changes. This is the idea underlying the use of memory in this field, which involves key design issue...

EUSLEM: A lemmatiser/tagger for Basque

This paper presents relevant issues that have been considered in the design and development of a general purpose lemmatiser/tagger for Basque (EUSLEM). The lemmatiser/tagger is conceived as a basic tool for other linguistic applications. It uses the lexical database and the morphological analyser previously developed and implemented. We will describe the components used in the development of t...

Publication date: 2006